PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
SESSION: Applications
Coarse grain parallelization of deep neural networks
Marc Gonzalez Tallada
High performance model based image reconstruction
Xiao Wang
Amit Sabne
Sherman Kisner
Anand Raghunathan
Charles Bouman
Samuel Midkiff
Exploiting accelerators for efficient high dimensional similarity search
Sandeep R. Agrawal
Christopher M. Dee
Alvin R. Lebeck
SESSION: Language implementation and domain specific languages
Declarative coordination of graph-based parallel programs
Flavio Cruz
Ricardo Rocha
Seth Copen Goldstein
Distributed Halide
Tyler Denniston
Shoaib Kamil
Saman Amarasinghe
Parallel type-checking with Haskell using saturating LVars and stream generators
Ryan R. Newton
Ömer S. Ağacan
Peter Fogg
Sam Tobin-Hochstadt
SESSION: Algorithms
Articulation points guided redundancy elimination for betweenness centrality
Lei Wang
Fan Yang
Liangji Zhuang
Huimin Cui
Fang Lv
Xiaobing Feng
Multi-core on-the-fly SCC decomposition
Vincent Bloemen
Alfons Laarman
Jaco van de Pol
A high-performance parallel algorithm for nonnegative matrix factorization
Ramakrishnan Kannan
Grey Ballard
Haesun Park
AUTOGEN: automatic discovery of cache-oblivious parallel recursive algorithms for solving dynamic programs
Rezaul Chowdhury
Pramod Ganapathi
Jesmin Jahan Tithi
Charles Bachmeier
Bradley C. Kuszmaul
Charles E. Leiserson
Armando Solar-Lezama
Yuan Tang
SESSION: GPUs and scheduling
Gunrock: a high-performance graph processing library on the GPU
Yangzihao Wang
Andrew Davidson
Yuechao Pan
Yuduo Wu
Andy Riffel
John D. Owens
GPU multisplit
Saman Ashkiani
Andrew Davidson
Ulrich Meyer
John D. Owens
Keep calm and react with foresight: strategies for low-latency and energy-efficient elastic data stream processing
Tiziano De Matteis
Gabriele Mencagli
Work stealing for interactive services to meet target latency
Jing Li
Kunal Agrawal
Sameh Elnikety
Yuxiong He
I-Ting Angelina Lee
Chenyang Lu
Kathryn S. McKinley
SESSION: Shared-memory data structures
Adding approximate counters
Guy L. Steele, Jr.
Jean-Baptiste Tristan
A wait-free queue as fast as fetch-and-add
Chaoran Yang
John Mellor-Crummey
Lease/release: architectural support for scaling contended data structures
Syed Kamran Haider
William Hasenplaugh
Dan Alistarh
SESSION: Optimistic concurrency
Optimistic concurrency with OPTIK
Rachid Guerraoui
Vasileios Trigonakis
Refined transactional lock elision
Dave Dice
Alex Kogan
Yossi Lev
Drinking from both glasses: combining pessimistic and optimistic tracking of cross-thread dependences
Man Cao
Minjia Zhang
Aritra Sengupta
Michael D. Bond
SESSION: Locking
Be my guest: MCS lock now welcomes guests
Tianzheng Wang
Milind Chabbi
Hideaki Kimura
Contention-conscious, locality-preserving locks
Milind Chabbi
John Mellor-Crummey
DomLock: a new multi-granularity locking technique for hierarchies
Saurabh Kalikar
Rupesh Nasre
SESSION: Consistency models
Benchmarking weak memory models
Carl G. Ritson
Scott Owens
The virtues of conflict: analysing modern concurrency
Ganesh Narayanaswamy
Saurabh Joshi
Daniel Kroening
Causal consistency: beyond memory
Matthieu Perrin
Achour Mostefaoui
Claude Jard
SESSION: Performance analysis and debugging
ESTIMA: extrapolating scalability of in-memory applications
Georgios Chatzopoulos
Aleksandar Dragojević
Rachid Guerraoui
Grain graphs: OpenMP performance analysis made easy
Ananya Muddukrishna
Peter A. Jonsson
Artur Podobas
Mats Brorsson
Production-guided concurrency debugging
Nuno Machado
Brandon Lucia
Luís Rodrigues
POSTER SESSION: Poster abstracts
Affinity-aware work-stealing for integrated CPU-GPU processors
Naila Farooqui
Rajkishore Barik
Brian T. Lewis
Tatiana Shpeisman
Karsten Schwan
An interval constrained memory allocator for the Givy GAS runtime
François Gindraud
Fabrice Rastello
Albert Cohen
François Broquedis
A programming system for future proofing performance critical libraries
Li-Wen Chang
Izzat El Hajj
Hee-Seok Kim
Juan Gómez-Luna
Abdul Dakkak
Wen-mei Hwu
A scalable lock-free hash table with open addressing
Jesper Puge Nielsen
Sven Karlsson
Concurrent hash tables: fast and general?(!)
Tobias Maier
Peter Sanders
Roman Dementiev
CUDA acceleration for Xen virtual machines in InfiniBand clusters with rCUDA
Javier Prades
Carlos Reaño
Federico Silla
Effect of portable fine-grained locality on energy efficiency and performance in concurrent search trees
Ibrahim Umar
Otto J. Anshus
Phuong H. Ha
Efficient distributed workstealing via matchmaking
Hrushit Parikh
Vinit Deodhar
Ada Gavrilovska
Santosh Pande
Data-centric combinatorial optimization of parallel code
Hao Luo
Guoyang Chen
Pengcheng Li
Chen Ding
Xipeng Shen
DSMR: a shared and distributed memory algorithm for single-source shortest path problem
Saeed Maleki
Donald Nguyen
Andrew Lenharth
María Garzarán
David Padua
Keshav Pingali
Generic messages: capability-based shared memory parallelism for event-loop systems
Luca Salucci
Daniele Bonetta
Stefan Marr
Walter Binder
Hybrid CPU-GPU scheduling and execution of tree traversals
Jianqiao Liu
Nikhil Hegde
Milind Kulkarni
Improving efficacy of internal binary search trees using local recovery
Arunmoezhi Ramachandran
Neeraj Mittal
Merge-based sparse matrix-vector multiplication (SpMV) using the CSR storage format
Duane Merrill
Michael Garland
NUMA-aware scheduling and memory allocation for data-flow task-parallel applications
Andi Drebes
Antoniu Pop
Karine Heydemann
Nathalie Drach
Albert Cohen
On designing NUMA-aware concurrency control for scalable transactional memory
Mohamed Mohamedin
Roberto Palmieri
Sebastiano Peluso
Binoy Ravindran
On ordering transaction commit
Mohamed M. Saad
Roberto Palmieri
Binoy Ravindran
OPR: deterministic group replay for one-sided communication
Xuehai Qian
Koushik Sen
Paul Hargrove
Costin Iancu
Preemption-aware planning on big-data systems
Marco Rabozzi
Matteo Mazzucchelli
Roberto Cordone
Giovanni Matteo Fumarola
Marco D. Santambrogio
Samsara parallel: a non-BSP parallel-in-time model
Yifeng Chen
Kun Huang
Bei Wang
Guohui Li
Xiang Cui
Scalable adaptive NUMA-aware lock: combining local locking and remote locking for efficient concurrency
Mingzhe Zhang
Francis C. M. Lau
Cho-Li Wang
Luwei Cheng
Haibo Chen
SPIRIT: a runtime system for distributed irregular tree applications
Nikhil Hegde
Jianqiao Liu
Milind Kulkarni
Tidex: a mutual exclusion lock
Pedro Ramalhete
Andreia Correia
Unifying fixed code and fixed data mapping of load-imbalanced pipelined loops
Aristeidis Mastoras
Thomas R. Gross
User-assisted storage reuse determination for dynamic task graphs
Mehmet Can Kurt
Bin Ren
Sriram Krishnamoorthy
Gagan Agrawal
Verification of MPI Java programs using software model checking
Waqas Ur Rehman
Muhammad Sohaib Ayub
Junaid Haroon Siddiqui