2023-07-27 |
How to Train Your YouTube Recommender |
Alexander Liu et.al. |
2307.14551v1 |
null |
YouTube provides features for users to indicate disinterest when presented with unwanted recommendations, such as the "Not interested" and "Don't recommend channel" buttons. These buttons are purported to allow the user to correct "mistakes" made by the recommendation system. Yet, relatively little is known about the empirical efficacy of these buttons. Neither is much known about users' awareness of and confidence in them. To address these gaps, we simulated YouTube users with sock puppet agents. Each agent first executed a "stain phase", where it watched many videos of one assigned topic; then it executed a "scrub phase", where it tried to remove recommendations of the assigned topic. Each agent repeatedly applied a single scrubbing strategy, which included disliking previously-watched videos or deleting them from watch history, as well as clicking the "not interested" or "don't recommend channel" button on newly-recommended videos. Overall, we found that the stain phase significantly increased the fraction of the recommended videos on the user's homepage dedicated to the assigned topic. For the scrub phase, using the "Not interested" button worked best, significantly reducing such recommendations in all topics tested, on average removing 88% of them. Neither the stain phase nor the scrub phase, however, had much effect on videopage recommendations (those given to users while they watch a video). We also ran a survey ($N = 300$) asking adult YouTube users in the US whether they were aware of and used these buttons before, as well as how effective they found these buttons to be. We found that 44% of participants were not aware that the "Not interested" button existed. However, those who were aware of this button often used it to remove unwanted recommendations (82.8%) and found it to be modestly effective (3.42 out of 5). |
2023-07-25 |
Insights into Cognitive Engagement: Comparing the Effectiveness of Game-Based and Video-Based Learning |
Shayla Sharmin et.al. |
2307.13637v1 |
null |
The analysis of brain signals holds considerable importance in enhancing our comprehension of diverse learning techniques and cognitive mechanisms. Game-based learning is increasingly being recognized for its interactive and engaging educational approach. A pilot study of twelve participants divided into experimental and control groups was conducted to understand its effects on cognitive processes. Both groups were provided with the same content regarding the basic structure of the graph. The participants in the experimental group engaged in a quiz-based game, while those in the control group watched a pre-recorded video. Functional Near-Infrared Spectroscopy (fNIRS) was employed to acquire cerebral signals, and a series of pre- and post-tests were administered. The findings of our study indicate that the group engaged in the game activity displayed elevated levels of oxygenated hemoglobin compared to the group involved in watching videos. Conversely, the deoxygenated hemoglobin levels remained relatively consistent across both groups throughout the learning process. These findings suggest that the use of game-based learning has a substantial influence on cognitive processes. Furthermore, both the game and video groups exhibited higher neural activity in the Lateral Prefrontal cortex (PFC). The oxygenated hemoglobin ratio demonstrates that the game group had 2.33 times more neural processing in the Lateral PFC than the video group. This data is further supported by the knowledge gain analysis, which indicates that the game-based approach resulted in a 47.74% higher knowledge gain than the video group, as calculated from the difference in pre- and post-test scores. |
2023-07-25 |
A Pairwise Dataset for GUI Conversion and Retrieval between Android Phones and Tablets |
Han Hu et.al. |
2307.13225v1 |
null |
With the popularity of smartphones and tablets, users have become accustomed to using different devices for different tasks, such as using their phones to play games and tablets to watch movies. To conquer the market, one app is often available on both smartphones and tablets. However, although one app has similar graphic user interfaces (GUIs) and functionalities on phone and tablet, current app developers typically start from scratch when developing a tablet-compatible version of their app, which drives up development costs and wastes existing design resources. Researchers are attempting to employ deep learning in automated GUIs development to enhance developers' productivity. Deep learning models rely heavily on high-quality datasets. There are currently several publicly accessible GUI page datasets for phones, but none for pairwise GUIs between phones and tablets. This poses a significant barrier to the employment of deep learning in automated GUI development. In this paper, we collect and make public the Papt dataset, which is a pairwise dataset for GUI conversion and retrieval between Android phones and tablets. The dataset contains 10,035 phone-tablet GUI page pairs from 5,593 phone-tablet app pairs. We illustrate the approaches of collecting pairwise data and statistical analysis of this dataset. We also illustrate the advantages of our dataset compared to other current datasets. Through preliminary experiments on this dataset, we analyse the present challenges of utilising deep learning in automated GUI development and find that our dataset can assist the application of some deep learning models to tasks involving automatic GUI development. |
2023-07-24 |
A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning |
Benjamin Eysenbach et.al. |
2307.12968v1 |
link |
As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting. One-step methods perform regularization by doing just a single step of policy improvement, while critic regularization methods do many steps of policy improvement with a regularized objective. These methods appear distinct. One-step methods, such as advantage-weighted regression and conditional behavioral cloning, truncate policy iteration after just one step. This "early stopping" makes one-step RL simple and stable, but can limit its asymptotic performance. Critic regularization typically requires more compute but has appealing lower-bound guarantees. In this paper, we draw a close connection between these methods: applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL. While practical implementations violate our assumptions and critic regularization is typically applied with smaller regularization coefficients, our experiments nevertheless show that our analysis makes accurate, testable predictions about practical offline RL methods (CQL and one-step RL) with commonly-used hyperparameters. Our results do not imply that every problem can be solved with a single step of policy improvement, but rather that one-step RL might be competitive with critic regularization on RL problems that demand strong regularization. |
2023-07-24 |
Rechargeable Li/Cl$_2$ battery down to -80 °C |
Peng Liang et.al. |
2307.12947v1 |
null |
Low temperature rechargeable batteries are important to life in cold climates, polar/deep-sea expeditions and space explorations. Here, we report ~ 3.5 - 4 V rechargeable lithium/chlorine (Li/Cl2) batteries operating down to -80 °C, employing Li metal negative electrode, a novel CO2 activated porous carbon (KJCO2) as the positive electrode, and a high ionic conductivity (~ 5 to 20 mS cm-1 from -80 °C to 25 °C) electrolyte comprised of 1 M aluminum chloride (AlCl3), 0.95 M lithium chloride (LiCl), and 0.05 M lithium bis(fluorosulfonyl)imide (LiFSI) in low melting point (-104.5 °C) thionyl chloride (SOCl2). Between room-temperature and -80 °C, the Li/Cl2 battery delivered up to ~ 30,000 - 4,500 mAh g-1 first discharge capacity and a 1,200 - 5,000 mAh g-1 reversible capacity (discharge voltages in ~ 3.5 to 3.1 V) over up to 130 charge-discharge cycles. Mass spectrometry and X-ray photoelectron spectroscopy (XPS) probed Cl2 trapped in the porous carbon upon LiCl electro-oxidation during charging. At lower temperature down to -80 °C, SCl2/S2Cl2 and Cl2 generated by electro-oxidation in the charging step were trapped in porous KJCO2 carbon, allowing for reversible reduction to afford a high discharge voltage plateau near ~ 4 V with up to ~ 1000 mAh g-1 capacity for SCl2/S2Cl2 reduction and up to ~ 4000 mAh g-1 capacity at ~ 3.1 V plateau for Cl2 reduction. Towards practical use, we made CR2032 Li/Cl2 battery cells to drive digital watches at -40 °C and light emitting diode at -80 °C, opening Li/Cl2 secondary batteries for ultra-cold conditions. |
2023-07-24 |
Less is More: Focus Attention for Efficient DETR |
Dehua Zheng et.al. |
2307.12612v1 |
link |
DETR-like models have significantly boosted the performance of detectors and even outperformed classical convolutional models. However, treating all tokens equally without discrimination brings a redundant computational burden in the traditional encoder structure. Recent sparsification strategies exploit a subset of informative tokens to reduce attention complexity while maintaining performance through the sparse encoder. But these methods tend to rely on unreliable model statistics. Moreover, simply reducing the token population hinders the detection performance to a large extent, limiting the application of these sparse models. We propose Focus-DETR, which focuses attention on more informative tokens for a better trade-off between computation efficiency and model accuracy. Specifically, we reconstruct the encoder with dual attention, which includes a token scoring mechanism that considers both localization and category semantic information of the objects from multi-scale feature maps. We efficiently abandon the background queries and enhance the semantic interaction of the fine-grained object queries based on the scores. Compared with the state-of-the-art sparse DETR-like detectors under the same setting, our Focus-DETR gets comparable complexity while achieving 50.4 AP (+2.2) on COCO. The code is available at https://github.com/huawei-noah/noah-research/tree/master/Focus-DETR and https://gitee.com/mindspore/models/tree/master/research/cv/Focus-DETR. |
2023-07-24 |
Multi-Shooting Differential Dynamic Programming for Hybrid Systems using Analytical Derivatives |
Shubham Singh et.al. |
2307.12606v1 |
null |
Differential Dynamic Programming (DDP) is a popular technique used to generate motion for dynamic-legged robots in the recent past. However, in most cases, only the first-order partial derivatives of the underlying dynamics are used, resulting in the iLQR approach. Neglecting the second-order terms often slows down the convergence rate compared to full DDP. Multi-Shooting is another popular technique to improve robustness, especially if the dynamics are highly non-linear. In this work, we consider Multi-Shooting DDP for trajectory optimization of a bounding gait for a simplified quadruped model. As the main contribution, we develop Second-Order analytical partial derivatives of the rigid-body contact dynamics, extending our previous results for fixed/floating base models with multi-DoF joints. Finally, we show the benefits of a novel Quasi-Newton method for approximating second-order derivatives of the dynamics, leading to order-of-magnitude speedups in the convergence compared to the full DDP method. |
2023-07-24 |
Automated Mapping of Adaptive App GUIs from Phones to TVs |
Han Hu et.al. |
2307.12522v1 |
null |
With the increasing interconnection of smart devices, users often desire to adopt the same app on quite different devices for identical tasks, such as watching the same movies on both their smartphones and TV. However, the significant differences in screen size, aspect ratio, and interaction styles make it challenging to adapt Graphical User Interfaces (GUIs) across these devices. Although there are millions of apps available on Google Play, only a few thousand are designed to support smart TV displays. Existing techniques to map a mobile app GUI to a TV either adopt a responsive design, which struggles to bridge the substantial gap between phone and TV or use mirror apps for improved video display, which requires hardware support and extra engineering efforts. Instead of developing another app for supporting TVs, we propose a semi-automated approach to generate corresponding adaptive TV GUIs, given the phone GUIs as the input. Based on our empirical study of GUI pairs for TV and phone in existing apps, we synthesize a list of rules for grouping and classifying phone GUIs, converting them to TV GUIs, and generating dynamic TV layouts and source code for the TV display. Our tool is not only beneficial to developers but also to GUI designers, who can further customize the generated GUIs for their TV app development. An evaluation and user study demonstrate the accuracy of our generated GUIs and the usefulness of our tool. |
2023-07-23 |
LiveRetro: Visual Analytics for Strategic Retrospect in Livestream E-Commerce |
Yuchen Wu et.al. |
2307.12213v1 |
null |
Livestream e-commerce integrates live streaming and online shopping, allowing viewers to make purchases while watching. However, effective marketing strategies remain a challenge due to limited empirical research and subjective biases from the absence of quantitative data. Current tools fail to capture the interdependence between live performances and feedback. This study identified computational features, formulated design requirements, and developed LiveRetro, an interactive visual analytics system. It enables comprehensive retrospective analysis of livestream e-commerce for streamers, viewers, and merchandise. LiveRetro employs enhanced visualization and time-series forecasting models to align performance features and feedback, identifying influences at channel, merchandise, feature, and segment levels. Through case studies and expert interviews, the system provides deep insights into the relationship between live performance and streaming statistics, enabling efficient strategic analysis from multiple perspectives. |
2023-07-21 |
Large Language Model-based System to Provide Immediate Feedback to Students in Flipped Classroom Preparation Learning |
Shintaro Uchiyama et.al. |
2307.11388v1 |
null |
This paper proposes a system that uses large language models to provide immediate feedback to students in flipped classroom preparation learning. This study aimed to solve challenges in the flipped classroom model, such as ensuring that students are emotionally engaged and motivated to learn. Students often have questions about the content of lecture videos in the preparation of flipped classrooms, but it is difficult for teachers to answer them immediately. The proposed system was developed using the ChatGPT API on a video-watching support system for preparation learning that is being used in real practice. Answers from ChatGPT often do not align with the context of the student's question. Therefore, this paper also proposes a method to align the answer with the context. This paper also proposes a method to collect the teacher's answers to the students' questions and use them as additional guides for the students. This paper discusses the design and implementation of the proposed system. |
2023-07-21 |
OpenGDA: Graph Domain Adaptation Benchmark for Cross-network Learning |
Boshen Shi et.al. |
2307.11341v1 |
link |
Graph domain adaptation models are widely adopted in cross-network learning tasks, with the aim of transferring labeling or structural knowledge. Currently, there mainly exist two limitations in evaluating graph domain adaptation models. On one side, they are primarily tested for the specific cross-network node classification task, leaving tasks at edge-level and graph-level largely under-explored. Moreover, they are primarily tested in limited scenarios, such as social networks or citation networks, lacking validation of model's capability in richer scenarios. As comprehensively assessing models could enhance model practicality in real-world applications, we propose a benchmark, known as OpenGDA. It provides abundant pre-processed and unified datasets for different types of tasks (node, edge, graph). They originate from diverse scenarios, covering web information systems, urban systems and natural systems. Furthermore, it integrates state-of-the-art models with standardized and end-to-end pipelines. Overall, OpenGDA provides a user-friendly, scalable and reproducible benchmark for evaluating graph domain adaptation models. The benchmark experiments highlight the challenges of applying GDA models to real-world applications with consistent good performance, and potentially provide insights to future research. As an emerging project, OpenGDA will be regularly updated with new datasets and models. It could be accessed from https://github.com/Skyorca/OpenGDA. |
2023-07-21 |
Fused Spectatorship: Designing Bodily Experiences Where Spectators Become Players |
Rakesh Patibanda et.al. |
2307.11297v1 |
null |
Spectating digital games can be exciting. However, due to its vicarious nature, spectators often wish to engage in the gameplay beyond just watching and cheering. To blur the boundaries between spectators and players, we propose a novel approach called "Fused Spectatorship", where spectators watch their hands play games by loaning bodily control to a computational Electrical Muscle Stimulation (EMS) system. To showcase this concept, we designed three games where spectators loan control over both their hands to the EMS system and watch them play these competitive and collaborative games. A study with 12 participants suggested that participants could not distinguish if they were watching their hands play, or if they were playing the games themselves. We used our results to articulate four spectator experience themes and four fused spectator types, along with the behaviours they elicited, and we offer one design consideration to support each of these behaviours. We also discuss the ethical design considerations of our approach to help game designers create future fused spectatorship experiences. |
2023-07-20 |
Underwater 3D positioning on smart devices |
Tuochao Chen et.al. |
2307.11263v1 |
null |
The emergence of waterproof mobile and wearable devices (e.g., Garmin Descent and Apple Watch Ultra) designed for underwater activities like professional scuba diving opens up opportunities for underwater networking and localization capabilities on these devices. Here, we present the first underwater acoustic positioning system for smart devices. Unlike conventional systems that use floating buoys as anchors at known locations, we design a system where a dive leader can compute the relative positions of all other divers, without any external infrastructure. Our intuition is that in a well-connected network of devices, if we compute the pairwise distances, we can determine the shape of the network topology. By incorporating orientation information about a single diver who is in the visual range of the leader device, we can then estimate the positions of all the remaining divers, even if they are not within sight. We address various practical problems including detecting erroneous distance estimates, addressing rotational and flipping ambiguities as well as designing a distributed timestamp protocol that scales linearly with the number of devices. Our evaluations show that our distributed system running on underwater deployments of 4-5 commodity smart devices can perform pairwise ranging and localization with median errors of 0.5-0.9 m and 0.9-1.6 m. |
2023-07-20 |
Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV |
Jaime Spencer et.al. |
2307.10713v1 |
link |
Self-supervised monocular depth estimation (SS-MDE) has the potential to scale to vast quantities of data. Unfortunately, existing approaches limit themselves to the automotive domain, resulting in models incapable of generalizing to complex environments such as natural or indoor settings. To address this, we propose a large-scale SlowTV dataset curated from YouTube, containing an order of magnitude more data than existing automotive datasets. SlowTV contains 1.7M images from a rich diversity of environments, such as worldwide seasonal hiking, scenic driving and scuba diving. Using this dataset, we train an SS-MDE model that provides zero-shot generalization to a large collection of indoor/outdoor datasets. The resulting model outperforms all existing SSL approaches and closes the gap on supervised SoTA, despite using a more efficient architecture. We additionally introduce a collection of best-practices to further maximize performance and zero-shot generalization. This includes 1) aspect ratio augmentation, 2) camera intrinsic estimation, 3) support frame randomization and 4) flexible motion estimation. Code is available at https://github.com/jspenmar/slowtv_monodepth. |
2023-07-19 |
Watch out Venomous Snake Species: A Solution to SnakeCLEF2023 |
Feiran Hu et.al. |
2307.09748v1 |
link |
The SnakeCLEF2023 competition aims at the development of advanced algorithms for snake species identification through the analysis of images and accompanying metadata. This paper presents a method that leverages both images and metadata. Modern CNN models and strong data augmentation are used to learn better image representations. To relieve the challenge of the long-tailed distribution, seesaw loss is used in our method. We also design a light model to calculate prior probabilities using metadata features extracted from CLIP in the post-processing stage. In addition, we attach more importance to venomous species by assigning venomous species labels to some examples that the model is uncertain about. Our method achieves a score of 91.31% on the final metric, which combines F1 and other metrics, on the private leaderboard, ranking 1st among the participants. The code is available at https://github.com/xiaoxsparraw/CLEF2023. |
2023-07-18 |
GroupLane: End-to-End 3D Lane Detection with Channel-wise Grouping |
Zhuoling Li et.al. |
2307.09472v1 |
null |
Efficiency is quite important for 3D lane detection due to practical deployment demands. In this work, we propose a simple, fast, and end-to-end detector that still maintains high detection precision. Specifically, we devise a set of fully convolutional heads based on row-wise classification. In contrast to previous counterparts, ours supports recognizing both vertical and horizontal lanes. Moreover, our method is the first to perform row-wise classification in bird's-eye view. In the heads, we split the features into multiple groups, and each group of features corresponds to a lane instance. During training, the predictions are associated with lane labels using the proposed single-win one-to-one matching to compute the loss, and no post-processing operation is required for inference. In this way, our proposed fully convolutional detector, GroupLane, realizes end-to-end detection like DETR. Evaluated on 3 real-world 3D lane benchmarks, OpenLane, Once-3DLanes, and OpenLane-Huawei, GroupLane adopting ConvNext-Base as the backbone outperforms the published state-of-the-art PersFormer by 13.6% F1 score on the OpenLane validation set. In addition, GroupLane with ResNet18 still surpasses PersFormer by 4.9% F1 score, while its inference speed is nearly 7x faster and its FLOPs are only 13.3% of PersFormer's. |
2023-07-17 |
Multi-Task Cross-Modality Attention-Fusion for 2D Object Detection |
Huawei Sun et.al. |
2307.08339v1 |
null |
Accurate and robust object detection is critical for autonomous driving. Image-based detectors face difficulties caused by low visibility in adverse weather conditions. Thus, radar-camera fusion is of particular interest but presents challenges in optimally fusing heterogeneous data sources. To approach this issue, we propose two new radar preprocessing techniques to better align radar and camera data. In addition, we introduce a Multi-Task Cross-Modality Attention-Fusion Network (MCAF-Net) for object detection, which includes two new fusion blocks. These allow for exploiting information from the feature maps more comprehensively. The proposed algorithm jointly detects objects and segments free space, which guides the model to focus on the more relevant part of the scene, namely, the occupied space. Our approach outperforms current state-of-the-art radar-camera fusion-based object detectors in the nuScenes dataset and achieves more robust results in adverse weather conditions and nighttime scenarios. |
2023-07-13 |
Probing the Galactic Halo with RR Lyrae Stars -- V. Chemistry, Kinematics, and Dynamically Tagged Groups |
Jonathan Cabrera Garcia et.al. |
2307.09572v1 |
null |
We employ a sample of 135,873 RR Lyrae stars (RRLs) with precise photometric-metallicity and distance estimates from the newly calibrated $P$--$\phi_{31}$--$R_{21}$--[Fe/H] and $Gaia$ $G$-band $P$--$R_{21}$--[Fe/H] absolute magnitude-metallicity relations of Li et al., combined with available proper motions from $Gaia$ EDR3, and 6955 systemic radial velocities from $Gaia$ DR3 and other sources, in order to explore the chemistry and kinematics of the halo of the Milky Way (MW). This sample is ideally suited for characterization of the inner- and outer-halo populations of the stellar halo, free from the bias associated with spectroscopically selected probes, and for estimation of their relative contributions as a function of Galactocentric distance. The results of a Gaussian Mixture-Model analysis of these contributions are broadly consistent with other observational studies of the halo, and with expectations from recent MW simulation studies. We apply the HDBSCAN clustering method to the specific energies and cylindrical actions ($E$, $J_{r}$, $J_{\phi}$, $J_{z}$), identifying 97 Dynamically Tagged Groups (DTGs) of RRLs, and explore their associations with recognized substructures of the MW. The precise photometric-distance determinations ($\delta\, d/d < 5$\%), and the resulting high-quality determination of dynamical parameters, yield highly statistically significant (low) dispersions of [Fe/H] for the stellar members of the DTGs compared to random draws from the full sample, indicating that they share common star-formation and chemical histories, influenced by their birth environments. |
2023-07-13 |
Watch Your Pose: Unsupervised Domain Adaption with Pose based Triplet Selection for Gait Recognition |
Gavriel Habib et.al. |
2307.06751v1 |
null |
Gait Recognition is a computer vision task aiming to identify people by their walking patterns. Existing methods show impressive results on individual datasets but lack the ability to generalize to unseen scenarios. Unsupervised Domain Adaptation (UDA) tries to adapt a model, pre-trained in a supervised manner on a source domain, to an unlabelled target domain. UDA for Gait Recognition is still in its infancy and existing works proposed solutions to limited scenarios. In this paper, we reveal a fundamental phenomenon in adaptation of gait recognition models, in which the target domain is biased to pose-based features rather than identity features, causing a significant performance drop in the identification task. We suggest Gait Orientation-based method for Unsupervised Domain Adaptation (GOUDA) to reduce this bias. To this end, we present a novel Triplet Selection algorithm with a curriculum learning framework, aiming to adapt the embedding space by pushing away samples of similar poses and bringing closer samples of different poses. We provide extensive experiments on four widely-used gait datasets, CASIA-B, OU-MVLP, GREW, and Gait3D, and on three backbones, GaitSet, GaitPart, and GaitGL, showing the superiority of our proposed method over prior works. |
2023-07-11 |
Efficient 3D Articulated Human Generation with Layered Surface Volumes |
Yinghao Xu et.al. |
2307.05462v1 |
null |
Access to high-quality and diverse 3D articulated digital human assets is crucial in various applications, ranging from virtual reality to social platforms. Generative approaches, such as 3D generative adversarial networks (GANs), are rapidly replacing laborious manual content creation tools. However, existing 3D GAN frameworks typically rely on scene representations that leverage either template meshes, which are fast but offer limited quality, or volumes, which offer high capacity but are slow to render, thereby limiting the 3D fidelity in GAN settings. In this work, we introduce layered surface volumes (LSVs) as a new 3D object representation for articulated digital humans. LSVs represent a human body using multiple textured mesh layers around a conventional template. These layers are rendered using alpha compositing with fast differentiable rasterization, and they can be interpreted as a volumetric representation that allocates its capacity to a manifold of finite thickness around the template. Unlike conventional single-layer templates that struggle with representing fine off-surface details like hair or accessories, our surface volumes naturally capture such details. LSVs can be articulated, and they exhibit exceptional efficiency in GAN settings, where a 2D generator learns to synthesize the RGBA textures for the individual layers. Trained on unstructured, single-view 2D image datasets, our LSV-GAN generates high-quality and view-consistent 3D articulated digital humans without the need for view-inconsistent 2D upsampling networks. |
2023-07-10 |
Active Learning for Video Classification with Frame Level Queries |
Debanjan Goswami et.al. |
2307.05587v1 |
null |
Deep learning algorithms have pushed the boundaries of computer vision research and have demonstrated commendable performance in a variety of applications. However, training a robust deep neural network necessitates a large amount of labeled training data, acquiring which involves significant time and human effort. This problem is even more serious for an application like video classification, where a human annotator has to watch an entire video end-to-end to furnish a label. Active learning algorithms automatically identify the most informative samples from large amounts of unlabeled data; this tremendously reduces the human annotation effort in inducing a machine learning model, as only the few samples that are identified by the algorithm need to be labeled manually. In this paper, we propose a novel active learning framework for video classification, with the goal of further reducing the labeling burden on the human annotators. Our framework identifies a batch of exemplar videos, together with a set of informative frames for each video; the human annotator needs to merely review the frames and provide a label for each video. This involves much less manual work than watching the complete video to come up with a label. We formulate a criterion based on uncertainty and diversity to identify the informative videos and exploit representative sampling techniques to extract a set of exemplar frames from each video. To the best of our knowledge, this is the first research effort to develop an active learning framework for video classification, where the annotators need to inspect only a few frames to produce a label, rather than watching the end-to-end video. |
2023-07-07 |
A Self-Supervised Algorithm for Denoising Photoplethysmography Signals for Heart Rate Estimation from Wearables |
Pranay Jain et.al. |
2307.05339v1 |
null |
Smart watches and other wearable devices are equipped with photoplethysmography (PPG) sensors for monitoring heart rate and other aspects of cardiovascular health. However, PPG signals collected from such devices are susceptible to corruption from noise and motion artifacts, which cause errors in heart rate estimation. Typical denoising approaches filter or reconstruct the signal in ways that eliminate much of the morphological information, even from the clean parts of the signal that would be useful to preserve. In this work, we develop an algorithm for denoising PPG signals that reconstructs the corrupted parts of the signal, while preserving the clean parts of the PPG signal. Our novel framework relies on self-supervised training, where we leverage a large database of clean PPG signals to train a denoising autoencoder. As we show, our reconstructed signals provide better estimates of heart rate from PPG signals than the leading heart rate estimation methods. Further experiments show significant improvement in Heart Rate Variability (HRV) estimation from PPG signals using our algorithm. We conclude that our algorithm denoises PPG signals in a way that can improve downstream analysis of many different health metrics from wearable devices. |
2023-07-07 |
What makes a successful rebuttal in computer science conferences? : A perspective on social interaction |
Junjie Huang et.al. |
2307.03371v2 |
null |
With an exponential increase in submissions to top-tier Computer Science (CS) conferences, more and more conferences have introduced a rebuttal stage to the conference peer review process. The rebuttal stage can be modeled as social interactions between authors and reviewers. A successful rebuttal often results in an increased review score after the rebuttal stage. In this paper, we conduct an empirical study to determine the factors contributing to a successful rebuttal using over 3,000 papers and 13,000 reviews from ICLR2022, one of the most prestigious computer science conferences. First, we observe a significant difference in review scores before and after the rebuttal stage, which is crucial for paper acceptance. Furthermore, we investigate factors from the reviewer's perspective using signed social network analysis. A notable finding is the increase in balanced network structure after the rebuttal stage. Subsequently, we evaluate several quantifiable author rebuttal strategies and their effects on review scores. These strategies can help authors in improving their review scores. Finally, we used machine learning models to predict rebuttal success and validated the impact of potential factors analyzed in this paper. Our experiments demonstrate that the utilization of all features proposed in this study can aid in predicting the success of the rebuttal. In summary, this work presents a study on the impact factors of successful rebuttals from both reviewers' and authors' perspectives and lays the foundation for analyzing rebuttals with social network analysis. |
2023-07-06 |
Machine Learning Classification of Repeating FRBs from FRB121102 |
Bjorn Jasper R. Raquel et.al. |
2307.02811v2 |
null |
Fast Radio Bursts (FRBs) are mysterious bursts in the millisecond timescale at radio wavelengths. Currently, there is little understanding about the classification of repeating FRBs, based on differences in physics, which is of great importance in understanding their origin. Recent works from the literature focus on using specific parameters to classify FRBs to draw inferences on the possible physical mechanisms or properties of these FRB subtypes. In this study, we use 1652 publicly available repeating FRBs from FRB121102 detected with the Five-hundred-meter Aperture Spherical Telescope (FAST), and study them with an unsupervised machine learning model. By fine-tuning the hyperparameters of the model, we found that there is an indication for four clusters from the bursts of FRB121102 instead of the two clusters ("Classical" and "Atypical") suggested in the literature. Specifically, the "Atypical" cluster can be further classified into three sub-clusters with distinct characteristics. Our findings show that the clustering result we obtained is more comprehensive not only because our study produced results which are consistent with those in the literature but also because our work uses more physical parameters to create these clusters. Overall, our methods and analyses produced a more holistic approach in clustering the repeating FRBs of FRB121102. |
2023-07-03 |
MWPRanker: An Expression Similarity Based Math Word Problem Retriever |
Mayank Goel et.al. |
2307.01240v1 |
null |
Math Word Problems (MWPs) in online assessments help test the ability of the learner to make critical inferences by interpreting the linguistic information in them. To test the mathematical reasoning capabilities of the learners, sometimes the problem is rephrased or the thematic setting of the original MWP is changed. Since manual identification of MWPs with similar problem models is cumbersome, we propose a tool in this work for MWP retrieval. We propose a hybrid approach to retrieve similar MWPs with the same problem model. In our work, the problem model refers to the sequence of operations to be performed to arrive at the solution. We demonstrate that our tool is useful for the mentioned tasks and better than semantic similarity-based approaches, which fail to capture the arithmetic and logical sequence of the MWPs. A demo of the tool can be found at https://www.youtube.com/watch?v=gSQWP3chFIs |
2023-06-30 |
Collapse of Straight Soft Growing Inflated Beam Robots Under Their Own Weight |
Ciera McFarland et.al. |
2307.00089v1 |
null |
Soft, growing inflated beam robots, also known as everting vine robots, have previously been shown to navigate confined spaces with ease. Less is known about their ability to navigate three-dimensional open spaces where they have the potential to collapse under their own weight as they attempt to move through a space. Previous work has studied collapse of inflated beams and vine robots due to purely transverse or purely axial external loads. Here, we extend previous models to predict the length at which straight vine robots will collapse under their own weight at arbitrary launch angle relative to gravity, inflated diameter, and internal pressure. Our model successfully predicts the general trends of collapse behavior of straight vine robots. We find that collapse length increases non-linearly with the robot's launch angle magnitude, linearly with the robot's diameter, and with the square root of the robot's internal pressure. We also demonstrate the use of our model to determine the robot parameters required to grow a vine robot across a gap in the floor. This work forms the foundation of an approach for modeling the collapse of vine robots and inflated beams in arbitrary shapes. |
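The scaling trends reported above (linear in diameter, square root in internal pressure) can be illustrated with a small worked calculation. This is only a sketch: it assumes the two dependencies combine multiplicatively at a fixed launch angle, which the abstract does not state, and the function name is hypothetical. The non-linear angle dependence is deliberately left out.

```python
import math

def collapse_length_ratio(diameter_ratio, pressure_ratio):
    """Factor by which collapse length changes at a fixed launch angle.

    Assumes collapse length scales as diameter * sqrt(pressure); the
    multiplicative combination is an assumption of this sketch.
    """
    return diameter_ratio * math.sqrt(pressure_ratio)

# Doubling the diameter doubles the collapse length.
print(collapse_length_ratio(2.0, 1.0))
# Quadrupling the pressure also doubles it.
print(collapse_length_ratio(1.0, 4.0))
```

Under this assumption, raising pressure gives diminishing returns compared with widening the robot, since pressure enters only through its square root.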
2023-06-30 |
INDCOR White Paper 0: Interactive Digital Narratives (IDNs) -- A Solution to the Challenge of Representing Complex Issues |
Hartmut Koenitz et.al. |
2306.17498v1 |
null |
Citizens everywhere have the right to be well-informed. Yet, with the high complexity of many contemporary issues, such as global warming and migration, our means of information need to mutually adapt. Narrative has always been at the core of information exchange - regardless of whether our ancestors sat around a fire and exchanged stories, or whether we read an article in a newspaper, or watched a TV news broadcast. Yet, the narrative formats of the newspaper article, the news broadcast, the documentary, and the textbook are severely limited when it comes to representing highly complex topics which may include several competing - and sometimes equally valid - perspectives. Such complexity contributes to a high level of uncertainty due to a multitude of factors affecting an outcome. Fortunately, with Interactive Digital Narrative (IDN), there is a novel media format which can address these challenges. IDNs can present several different perspectives in the same work, and give audiences the ability to explore them at will through decision-making. After experiencing the consequences of their decisions, the audience can replay to revisit and change these decisions in order to consider their alternatives. IDN works enable deep personalization and the inclusion of live data. These capabilities make IDN a 21st century democratic medium, empowering citizens through the understanding of complex issues. In this white paper, we discuss the challenge of representing complexity, describe the advantages offered by IDNs, and point out opportunities and strategies for deployment. |
2023-06-30 |
24 New Light Curves and Updated Ephemeris using EXOTIC for WASP-12b |
Avinash S. Nediyedath et.al. |
2306.17473v2 |
null |
NASA citizen scientists from all over the world have used EXOplanet Transit Interpretation Code (EXOTIC) to reduce 71 sets of time-series images of WASP-12 taken by the 6-inch telescope operated by the Centre of Astrophysics |
2023-06-30 |
Leveraging Watch-time Feedback for Short-Video Recommendations: A Causal Labeling Framework |
Yang Zhang et.al. |
2306.17426v1 |
null |
With the proliferation of short video applications, the significance of short video recommendations has vastly increased. Unlike other recommendation scenarios, short video recommendation systems heavily rely on feedback from watch time. Existing approaches simply treat watch time as a direct label, which fails to effectively harness its extensive semantics and introduces bias, thereby limiting the potential for modeling user interests based on watch time. To overcome this challenge, we propose a framework named Debiased Multiple-semantics-extracting Labeling (DML). DML constructs labels that encompass various semantics by utilizing quantiles derived from the distribution of watch time, prioritizing relative order rather than absolute label values. This approach facilitates easier model learning while aligning with the ranking objective of recommendations. Furthermore, we introduce a method inspired by causal adjustment to refine label definitions, thereby reducing the impact of bias on the label and directly mitigating bias at the label level. We substantiate the effectiveness of our DML framework through both online and offline experiments. Extensive results demonstrate that our DML could effectively leverage watch time to discover users' real interests, enhancing their engagement in our application. |
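To make the quantile idea in the abstract above concrete, here is a minimal sketch of turning raw watch times into quantile-bin labels. It is not the DML label definition from the paper: the causal adjustment step is omitted, and the function name and `num_bins` parameter are hypothetical.

```python
import numpy as np

def quantile_labels(watch_times, num_bins=4):
    """Map each watch time to the index of the quantile bin it falls into."""
    watch_times = np.asarray(watch_times, dtype=float)
    # Interior quantile edges, e.g. the 25th/50th/75th percentiles for 4 bins.
    edges = np.quantile(watch_times, np.linspace(0, 1, num_bins + 1)[1:-1])
    # Bin index 0 holds the shortest watch times, num_bins - 1 the longest.
    return np.searchsorted(edges, watch_times, side="right")

seconds = [1, 2, 3, 4, 5, 6, 7, 8]
print(quantile_labels(seconds))  # [0 0 1 1 2 2 3 3]
```

Because the labels depend only on where a value sits in the distribution, rescaling every watch time by the same factor leaves them unchanged, which is the sense in which such labels prioritize relative order over absolute values.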
2023-06-29 |
13 New Light Curves and Updated Mid-Transit Time and Period for Hot Jupiter WASP-104 b with EXOTIC |
Heather B. Hewitt et.al. |
2306.17251v1 |
null |
Using the EXOplanet Transit Interpretation Code (EXOTIC), we reduced 52 sets of images of WASP-104 b, a Hot Jupiter-class exoplanet orbiting WASP-104, in order to obtain an updated mid-transit time (ephemeris) and orbital period for the planet. We performed this reduction on images taken with a 6-inch telescope of the Center for Astrophysics |