PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

SESSION: Keynote

It's Time for a New Old Language

Guy L. Steele, Jr.

SESSION: Session 1: GPU I

EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU

Guoyang Chen
Yue Zhao
Xipeng Shen
Huiyang Zhou

Layout Lock: A Scalable Locking Paradigm for Concurrent Data Layout Modifications

Nachshon Cohen
Arie Tal
Erez Petrank

Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning

Xiuxia Zhang
Guangming Tan
Shuangbai Xue
Jiajia Li
Keren Zhou
Mingyu Chen

SESSION: Session 2: Concurrency

Checking Concurrent Data Structures Under the C/C++11 Memory Model

Peizhao Ou
Brian Demsky

An Efficient Abortable-locking Protocol for Multi-level NUMA Systems

Milind Chabbi
Abdelhalim Amer
Shasha Wen
Xu Liu

Contention in Structured Concurrency: Provably Efficient Dynamic Non-Zero Indicators for Nested Parallelism

Umut A. Acar
Naama Ben-David
Mike Rainey

Noise Injection Techniques to Expose Subtle and Unintended Message Races

Kento Sato
Dong H. Ahn
Ignacio Laguna
Gregory L. Lee
Martin Schulz
Christopher M. Chambreau

SESSION: Session 3: Tools

Thread Data Sharing in Cache: Theory and Measurement

Hao Luo
Pengcheng Li
Chen Ding

Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs

Bin Ren
Sriram Krishnamoorthy
Kunal Agrawal
Milind Kulkarni

Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications

Sergei Shudler
Alexandru Calotoiu
Torsten Hoefler
Felix Wolf

Processor-Oblivious Record and Replay

Robert Utterback
Kunal Agrawal
I-Ting Angelina Lee
Milind Kulkarni

SESSION: Session 4: GPU II

Simple, Accurate, Analytical Time Modeling and Optimal Tile Size Selection for GPGPU Stencils

Nirmal Prajapati
Waruna Ranasinghe
Sanjay Rajopadhye
Rumen Andonov
Hristo Djidjev
Tobias Grosser

Combining SIMD and Many/Multi-core Parallelism for Finite State Machines with Enumerative Speculation

Peng Jiang
Gagan Agrawal

S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters

Ammar Ahmad Awan
Khaled Hamidouche
Jahanzeb Maqbool Hashmi
Dhabaleswar K. Panda

Model-based Iterative CT Image Reconstruction on GPUs

Amit Sabne
Xiao Wang
Sherman J. Kisner
Charles A. Bouman
Anand Raghunathan
Samuel P. Midkiff

SESSION: Session 5: Best Papers

Pagoda: Fine-Grained GPU Resource Virtualization for Narrow Tasks

Tsung Tai Yeh
Amit Sabne
Putt Sakdhnagool
Rudolf Eigenmann
Timothy G. Rogers

Groute: An Asynchronous Multi-GPU Programming Model for Irregular Computations

Tal Ben-Nun
Michael Sutton
Sreepathi Pai
Keshav Pingali

Tapir: Embedding Fork-Join Parallelism into LLVM's Intermediate Representation

Tao B. Schardl
William S. Moses
Charles E. Leiserson

A Multicore Path to Connectomics-on-Demand

Alexander Matveev
Yaron Meirovitch
Hayk Saribekyan
Wiktor Jakubiuk
Tim Kaler
Gergely Odor
David Budden
Aleksandar Zlateski
Nir Shavit

SESSION: Session 6: Languages & Compilers

SC-Haskell: Sequential Consistency in Languages That Minimize Mutable Shared Heap

Michael Vollmer
Ryan G. Scott
Madanlal Musuvathi
Ryan R. Newton

Synchronized-by-Default Concurrency for Shared-Memory Systems

Martin Bättig
Thomas R. Gross

Function Call Re-Vectorization

Rubens E.A. Moreira
Sylvain Collange
Fernando Magno Quintão Pereira

Optimizing the Four-Index Integral Transform Using Data Movement Lower Bounds Analysis

Samyam Rajbhandari
Fabrice Rastello
Karol Kowalski
Sriram Krishnamoorthy
P. Sadayappan

SESSION: Session 7: Data Analytics

Using Butterfly-Patterned Partial Sums to Draw from Discrete Distributions

Guy L. Steele, Jr.
Jean-Baptiste Tristan

KiWi: A Key-Value Map for Scalable Real-Time Analytics

Dmitry Basin
Edward Bortnikov
Anastasia Braginsky
Guy Golan-Gueta
Eshcar Hillel
Idit Keidar
Moshe Sulamy

Grammar-aware Parallelization for Scalable XPath Querying

Lin Jiang
Zhijia Zhao

Eunomia: Scaling Concurrent Search Trees under Contention Using HTM

Xin Wang
Weihua Zhang
Zhaoguo Wang
Ziyun Wei
Haibo Chen
Wenyun Zhao

SESSION: Session 8: Fault Tolerance

Self-Checkpoint: An In-Memory Checkpoint Method Using Less Space and Its Practice on Fault-Tolerant HPL

Xiongchao Tang
Jidong Zhai
Bowen Yu
Wenguang Chen
Weimin Zheng

Silent Data Corruption Resilient Two-sided Matrix Factorizations

Panruo Wu
Nathan DeBardeleben
Qiang Guan
Sean Blanchard
Jieyang Chen
Dingwen Tao
Xin Liang
Kaiming Ouyang
Zizhong Chen

POSTER SESSION: Session 9: Posters

POSTER: Reuse, don't Recycle: Transforming Algorithms that Throw Away Descriptors

Maya Arbel-Raviv
Trevor Brown

POSTER: An Architecture and Programming Model for Accelerating Parallel Commutative Computations via Privatization

Vignesh Balaji
Dhruva Tirumala
Brandon Lucia

POSTER: HythTM: Extending the Applicability of Intel TSX Hardware Transactional Support

Arnamoy Bhattacharyya
Mike Dai Wang
Mihai Burcea
Yi Ding
Allen Deng
Sai Varikooty
Shafaaf Hossain
Cristiana Amza

POSTER: Provably Efficient Scheduling of Cache-Oblivious Wavefront Algorithms

Rezaul Chowdhury
Pramod Ganapathi
Yuan Tang
Jesmin Jahan Tithi

POSTER: State Teleportation via Hardware Transactional Memory

Nachshon Cohen
Maurice Herlihy
Erez Petrank
Elias Wald

POSTER: IOGP: An Incremental Online Graph Partitioning for Large-Scale Distributed Graph Databases

Dong Dai
Wei Zhang
Yong Chen

POSTER: Distributed Control: The Benefits of Eliminating Global Synchronization via Effective Scheduling

Jesun Shariar Firoz
Thejaka Amila Kanewala
Marcin Zalewski
Martina Barnas
Andrew Lumsdaine

POSTER: MAPA: An Automatic Memory Access Pattern Analyzer for GPU Applications

Gangwon Jo
Jaehoon Jung
Jiyoung Park
Jaejin Lee

POSTER: Cache-Oblivious MPI All-to-All Communications on Many-Core Architectures

Shigang Li
Yunquan Zhang
Torsten Hoefler

POSTER: Automated Load Balancer Selection Based on Application Characteristics

Harshitha Menon
Kavitha Chandrasekar
Laxmikant V. Kale

POSTER: A GPU-Friendly Skiplist Algorithm

Nurit Moscovici
Nachshon Cohen
Erez Petrank

POSTER: Poor Man's URCU

Pedro Ramalhete
Andreia Correia

POSTER: A Wait-Free Queue with Wait-Free Memory Reclamation

Pedro Ramalhete
Andreia Correia

POSTER: STAR (Space-Time Adaptive and Reductive) Algorithms for Real-World Space-Time Optimality

Yuan Tang
Ronghui You

POSTER: Recovering Performance for Vector-based Machine Learning on Managed Runtime

Mingyu Wu
Haibing Guan
Binyu Zang
Haibo Chen

POSTER: On the Problem of Consistency Exceptions in the Context of Strong Memory Models

Minjia Zhang
Swarnendu Biswas
Michael D. Bond

POSTER: An Infrastructure for HPC Knowledge Sharing and Reuse

Yue Zhao
Chunhua Liao
Xipeng Shen