PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

SESSION: Applications

Coarse grain parallelization of deep neural networks

Marc Gonzalez Tallada

High performance model based image reconstruction

Xiao Wang
Amit Sabne
Sherman Kisner
Anand Raghunathan
Charles Bouman
Samuel Midkiff

Exploiting accelerators for efficient high dimensional similarity search

Sandeep R. Agrawal
Christopher M. Dee
Alvin R. Lebeck

SESSION: Language implementation and domain specific languages

Declarative coordination of graph-based parallel programs

Flavio Cruz
Ricardo Rocha
Seth Copen Goldstein

Distributed Halide

Tyler Denniston
Shoaib Kamil
Saman Amarasinghe

Parallel type-checking with Haskell using saturating LVars and stream generators

Ryan R. Newton
Ömer S. Ağacan
Peter Fogg
Sam Tobin-Hochstadt

SESSION: Algorithms

Articulation points guided redundancy elimination for betweenness centrality

Lei Wang
Fan Yang
Liangji Zhuang
Huimin Cui
Fang Lv
Xiaobing Feng

Multi-core on-the-fly SCC decomposition

Vincent Bloemen
Alfons Laarman
Jaco van de Pol

A high-performance parallel algorithm for nonnegative matrix factorization

Ramakrishnan Kannan
Grey Ballard
Haesun Park

AUTOGEN: automatic discovery of cache-oblivious parallel recursive algorithms for solving dynamic programs

Rezaul Chowdhury
Pramod Ganapathi
Jesmin Jahan Tithi
Charles Bachmeier
Bradley C. Kuszmaul
Charles E. Leiserson
Armando Solar-Lezama
Yuan Tang

SESSION: GPUs and scheduling

Gunrock: a high-performance graph processing library on the GPU

Yangzihao Wang
Andrew Davidson
Yuechao Pan
Yuduo Wu
Andy Riffel
John D. Owens

GPU multisplit

Saman Ashkiani
Andrew Davidson
Ulrich Meyer
John D. Owens

Keep calm and react with foresight: strategies for low-latency and energy-efficient elastic data stream processing

Tiziano De Matteis
Gabriele Mencagli

Work stealing for interactive services to meet target latency

Jing Li
Kunal Agrawal
Sameh Elnikety
Yuxiong He
I-Ting Angelina Lee
Chenyang Lu
Kathryn S. McKinley

SESSION: Shared-memory data structures

Adding approximate counters

Guy L. Steele, Jr.
Jean-Baptiste Tristan

A wait-free queue as fast as fetch-and-add

Chaoran Yang
John Mellor-Crummey

Lease/release: architectural support for scaling contended data structures

Syed Kamran Haider
William Hasenplaugh
Dan Alistarh

SESSION: Optimistic concurrency

Optimistic concurrency with OPTIK

Rachid Guerraoui
Vasileios Trigonakis

Refined transactional lock elision

Dave Dice
Alex Kogan
Yossi Lev

Drinking from both glasses: combining pessimistic and optimistic tracking of cross-thread dependences

Man Cao
Minjia Zhang
Aritra Sengupta
Michael D. Bond

SESSION: Locking

Be my guest: MCS lock now welcomes guests

Tianzheng Wang
Milind Chabbi
Hideaki Kimura

Contention-conscious, locality-preserving locks

Milind Chabbi
John Mellor-Crummey

DomLock: a new multi-granularity locking technique for hierarchies

Saurabh Kalikar
Rupesh Nasre

SESSION: Consistency models

Benchmarking weak memory models

Carl G. Ritson
Scott Owens

The virtues of conflict: analysing modern concurrency

Ganesh Narayanaswamy
Saurabh Joshi
Daniel Kroening

Causal consistency: beyond memory

Matthieu Perrin
Achour Mostefaoui
Claude Jard

SESSION: Performance analysis and debugging

ESTIMA: extrapolating scalability of in-memory applications

Georgios Chatzopoulos
Aleksandar Dragojević
Rachid Guerraoui

Grain graphs: OpenMP performance analysis made easy

Ananya Muddukrishna
Peter A. Jonsson
Artur Podobas
Mats Brorsson

Production-guided concurrency debugging

Nuno Machado
Brandon Lucia
Luís Rodrigues

POSTER SESSION: Poster abstracts

Affinity-aware work-stealing for integrated CPU-GPU processors

Naila Farooqui
Rajkishore Barik
Brian T. Lewis
Tatiana Shpeisman
Karsten Schwan

An interval constrained memory allocator for the Givy GAS runtime

François Gindraud
Fabrice Rastello
Albert Cohen
François Broquedis

A programming system for future proofing performance critical libraries

Li-Wen Chang
Izzat El Hajj
Hee-Seok Kim
Juan Gómez-Luna
Abdul Dakkak
Wen-mei Hwu

A scalable lock-free hash table with open addressing

Jesper Puge Nielsen
Sven Karlsson

Concurrent hash tables: fast and general?(!)

Tobias Maier
Peter Sanders
Roman Dementiev

CUDA acceleration for Xen virtual machines in InfiniBand clusters with rCUDA

Javier Prades
Carlos Reaño
Federico Silla

Effect of portable fine-grained locality on energy efficiency and performance in concurrent search trees

Ibrahim Umar
Otto J. Anshus
Phuong H. Ha

Efficient distributed workstealing via matchmaking

Hrushit Parikh
Vinit Deodhar
Ada Gavrilovska
Santosh Pande

Data-centric combinatorial optimization of parallel code

Hao Luo
Guoyang Chen
Pengcheng Li
Chen Ding
Xipeng Shen

DSMR: a shared and distributed memory algorithm for single-source shortest path problem

Saeed Maleki
Donald Nguyen
Andrew Lenharth
María Garzarán
David Padua
Keshav Pingali

Generic messages: capability-based shared memory parallelism for event-loop systems

Luca Salucci
Daniele Bonetta
Stefan Marr
Walter Binder

Hybrid CPU-GPU scheduling and execution of tree traversals

Jianqiao Liu
Nikhil Hegde
Milind Kulkarni

Improving efficacy of internal binary search trees using local recovery

Arunmoezhi Ramachandran
Neeraj Mittal

Merge-based sparse matrix-vector multiplication (SpMV) using the CSR storage format

Duane Merrill
Michael Garland

NUMA-aware scheduling and memory allocation for data-flow task-parallel applications

Andi Drebes
Antoniu Pop
Karine Heydemann
Nathalie Drach
Albert Cohen

On designing NUMA-aware concurrency control for scalable transactional memory

Mohamed Mohamedin
Roberto Palmieri
Sebastiano Peluso
Binoy Ravindran

On ordering transaction commit

Mohamed M. Saad
Roberto Palmieri
Binoy Ravindran

OPR: deterministic group replay for one-sided communication

Xuehai Qian
Koushik Sen
Paul Hargrove
Costin Iancu

Preemption-aware planning on big-data systems

Marco Rabozzi
Matteo Mazzucchelli
Roberto Cordone
Giovanni Matteo Fumarola
Marco D. Santambrogio

Samsara parallel: a non-BSP parallel-in-time model

Yifeng Chen
Kun Huang
Bei Wang
Guohui Li
Xiang Cui

Scalable adaptive NUMA-aware lock: combining local locking and remote locking for efficient concurrency

Mingzhe Zhang
Francis C. M. Lau
Cho-Li Wang
Luwei Cheng
Haibo Chen

SPIRIT: a runtime system for distributed irregular tree applications

Nikhil Hegde
Jianqiao Liu
Milind Kulkarni

Tidex: a mutual exclusion lock

Pedro Ramalhete
Andreia Correia

Unifying fixed code and fixed data mapping of load-imbalanced pipelined loops

Aristeidis Mastoras
Thomas R. Gross

User-assisted storage reuse determination for dynamic task graphs

Mehmet Can Kurt
Bin Ren
Sriram Krishnamoorthy
Gagan Agrawal

Verification of MPI Java programs using software model checking

Waqas Ur Rehman
Muhammad Sohaib Ayub
Junaid Haroon Siddiqui